Application of matrix variate Gaussian mixture model to statistical voice conversion
نویسندگان
چکیده
This paper describes a novel approach to construct a mapping function between a given speaker pair using probability density functions (PDF) of matrix variate. In voice conversion studies, two important functions should be realized: 1) precise modeling of both the source and target feature spaces, and 2) construction of a proper transform function between these spaces. Voice conversion based on Gaussian mixture model (GMM) is the de facto standard because of their flexibility and easiness in handling. In GMM-based approaches, a joint vector space of the source and target is first constructed, and the joint PDF of the two vectors is modeled as GMM in the joint vector space. The joint vector approach mainly focuses on precise modeling of the ‘joint’ feature space, and does not always construct a proper transform between two feature spaces. In contrast, the proposed method constructs the joint PDF as GMM in a matrix variate space whose row and column respectively correspond to the two functions, and it has potential to precisely model both the characteristics of the feature spaces and the relation between the source and target spaces.
منابع مشابه
Using Context-based Statistical Models to Promote the Quality of Voice Conversion Systems
This article aims to examine methods of optimizing GMM-based voice conversion systems performance in which GMM method is introduced as the basic method for improvement of voice conversion systems performance. In the current methods, due to using a single conversion function to convert all speech units and subsequent spectral smoothing arising from statistical averaging, we will observe quality ...
متن کاملMatrix-Variate Beta Generator - Developments and Application
Matrix-variate beta distributions are applied in different fields of hypothesis testing, multivariate correlation analysis, zero regression, canonical correlation analysis and etc. A methodology is proposed to generate matrix-variate beta generator distributions by combining the matrix-variate beta kernel with an unknown function of the trace operator. Several statistical characteristics, exten...
متن کاملExemplar-based voice conversion using non-negative spectrogram deconvolution
In the traditional voice conversion, converted speech is generated using statistical parametric models (for example Gaussian mixture model) whose parameters are estimated from parallel training utterances. A well-known problem of the statistical parametric methods is that statistical average in parameter estimation results in the over-smoothing of the speech parameter trajectories, and thus lea...
متن کاملMany-to-many voice conversion based on multiple non-negative matrix factorization
We present in this paper an exemplar-based Voice Conversion (VC) method using Non-negative Matrix Factorization (NMF), which is different from conventional statistical VC. NMF-based VC has advantages of noise robustness and naturalness of converted voice compared to Gaussian Mixture Model (GMM)based VC. However, because NMF-based VC is based on parallel training data of source and target speake...
متن کاملEsophageal Speech Enhancement Based on Statistical Voice Conversion with Gaussian Mixture Models
This paper presents a novel method of enhancing esophageal speech using statistical voice conversion. Esophageal speech is one of the alternative speaking methods for laryngectomees. Although it doesn’t require any external devices, generated voices usually sound unnatural compared with normal speech. To improve the intelligibility and naturalness of esophageal speech, we propose a voice conver...
متن کامل